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It is important to extract reaction coordinates or order parameters from protein simulations in order to in¬ 
vestigate the local minimum-energy states and the transitions between them. The most popular method to 
obtain such data is principal component analysis, which extracts modes of large conformational fluctuations 
around an average structure. We recently applied relaxation mode analysis for protein systems, which approx¬ 
imately estimates the slow relaxation modes and times from a simulation and enables investigations of the 
dynamic properties underlying the structural fluctuations of proteins. In this study, we apply this relaxation 
mode analysis to extract reaction coordinates for a system in which there are large conformational changes 
such as those commonly observed in protein folding/unfolding. We performed a 750-ns simulation of chigno¬ 
lin protein near its folding transition temperature, and observed many transitions between the most stable, 
misfolded, intermediate, and unfolded states. We then applied principal component analysis and relaxation 
mode analysis to the system. In the relaxation mode analysis, we could automatically extract good reaction 
coordinates. The free-energy surfaces provide a clearer understanding of the transitions not only between 
local minimum-energy states but also between the folded and unfolded states, even though the simulation 
involved large conformational changes. Moreover, we propose a new analysis method called Markov state 
relaxation mode analysis. We applied the new method to states with slow relaxation, which are defined by 
the free-energy surface obtained in the relaxation mode analysis. Finally, the relaxation times of the states 
obtained with a simple Markov state model and the proposed Markov state relaxation mode analysis are 
compared and discussed. 


I. INTRODUCTION 

Biopolymers have flexible structures and show variable 
functions. These functions are derived not only from the 
structure but also from the dynamics of the structural 
fluctuations themselves. Therefore, identifying the dy¬ 
namic properties of the structural fluctuations of biopoly¬ 
mers is important for understanding the interrelationship 
between their movements and functions. Classical molec¬ 
ular dynamics (MD) simulation is a popular and power¬ 
ful method for describing the structure, dynamics, and 
function of biomolecules in microscopic detail. Recent 
technological advances have allowed for simulations to be 
carried out on the timescales of the order of milliseconds 
(see reviews by Refs. 11 ). As longer and larger MD sim¬ 
ulations are now being performed, it has become more 
important to develop methods for extracting the most 
“essential” modes from the trajectory^. Indeed, the de¬ 
velopment of a method to reduce the huge number of 
degrees of freedom of coordinates to a few collective de¬ 
grees of freedom is an active field of theoretical research. 

Principal component analysis (PCA), also called quasi¬ 
harmonic analysis or the essential dynamics method^ 1 —, 
is one of the most popular methods for analyzing the 
static properties of the fluctuations of structures. A sys¬ 
tem’s fluctuations can be described in terms of only a 
few principal components (PCs). Moreover, this method 
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has been widely used to extract the few important col¬ 
lective modes of a biomolecule, which may serve as rel¬ 
evant coordinates of the free-energy surface. However, 
if the simulation involves large conformational changes 
such as folding/unfolding simulations, and the confor¬ 
mational changes between local minimum-energy states 
are small compared with those between the folded and 
unfolded states, it is difficult for PCA to extract the ef¬ 
fective modes or order parameters to identify the local 
minimum-energy states. The root mean square distance 
(RMSD) from a reference structure, the radius of gyra¬ 
tion, or selected distances, among others, are often used 
as order parameters to construct free-energy surfaces and 
analyze the structures obtained from a simulation. The 
selection of these order parameters depends on the sim¬ 
ulation system. To identify the local minimum-energy 
states from the simulation, suitable order parameters 
must be considered, which depend on the simulation sys¬ 
tem. Therefore, given the recent possibility for perform¬ 
ing longer MD simulations, a new analysis method must 
be able to extract suitable order parameters for identify¬ 
ing local minimum-energy states automatically, and for 
investigating the dynamics and kinetics of proteins. 

Relaxation mode analysis (RMA) was developed to in¬ 
vestigate the “dynamic” properties of spin systems^ and 
homopolymer systems^^ and has been applied to vari¬ 
ous polymer systems^ - — to investigate their slow relax¬ 
ation dynamic s 23 ! 24 . Recently, RMA has also been ap¬ 
plied to biomolecular system s 25 ' 26 . The relaxation modes 
and rates are given as left eigenfunctions and eigenvalues 
of the time-evolution operator of the master equation of 
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the system, respectively^ 13 . The equilibrium time cor¬ 
relation functions of the relaxation modes, {X p }, satisfy 

(Xp(t)X q ( 0)) = 5 p ^ q e p , (1) 

where (X p (t)X q ( 0)} denotes the equilibrium correlation 
of X p at time t and X q at time 0, and A p represents the 
relaxation rate of X p . 

In RMA, we calculate the approximate relaxation 
modes and rates from a simulation. RMA is formulated 
as a variational problem equivalent to the eigenvalue 
problem of the time-evolution operator of the system, 
where a relaxation mode X p is approximated by a trial 
function, which is constructed as a linear combination 
of relevant physical quantities that are time-evolved for 
to/2. The parameter to is introduced in order to reduce 
the relative weight of the faster modes contained in the 
physical quantities. The optimization of the normalized 
equilibrium autocorrelation function of the trial function 
time-displaced by r leads to the generalized eigenvalue 
problem. In practice, the time correlation matrices of 
structural fluctuations for two different times (to and 
t o + t) are calculated through simulations. Then, by 
solving a generalized eigenvalue problem for these matri¬ 
ces, the relaxation rates and modes are obtained from the 
eigenvalues and eigenvectors, respectively. Thus, we call 
this method the RMA method using a single evolution 
time: to/2. 

Recently, RMA was applied to a Monte Carlo simula¬ 
tion for a heteropolymer peptide system with a small 
number of degrees of freedom^, and its effectiveness 
was demonstrated. In the applications of RMA to ho¬ 
mopolymer systems^ - — , the translational degrees of 
freedom are removed from the conformations of the poly¬ 
mer sampled in a simulation. However, the rotational 
degrees of freedom are retained because the slow rota¬ 
tional relaxation of the polymer is important in poly¬ 
mer physics. In contrast, for heteropolymer biomolecule 
systems, both the translational and rotational degrees 
of freedom are removed from the sampled conformations 
of a biomolecule to allow for investigation of the struc¬ 
tural fluctuations around its average conformation. In 
our previous report^, we explained how to treat such 
sampled conformations with RMA, and succeeded in ap¬ 
plying RMA to a heteropolymer system. Moreover, we 
demonstrated the effectiveness of RMA to investigate the 
transitions between some local minimum-energy states by 
calculating the free-energy surfaces for relaxation modes. 

Although RMA is a powerful method for extracting 
slow modes, it requires a relatively high level of statisti¬ 
cal precision of the time correlation matrices. Owing to 
this requirement, RMA requires a long simulation where 
many transitions between local minimum-energy states 
occur, which may not be a problem because such very 
long simulations have become feasible in recent years, 
as mentioned above. Moreover, RMA cannot handle a 
large number of degrees of freedom directly. Therefore, 
reduction of the degrees of freedom is necessary for ap¬ 
plication of RMA to protein systems. To achieve this, 


we proposed a new analysis method, which is referred 
to as the principal component relaxation mode analy¬ 
sis (PCRMA) method. In this method, PCA is carried 
out first and then RMA is applied to a small number 
of principal components with large fluctuations. This 
method can systematically reduce the degrees of free¬ 
dom. We also proposed the RMA method using multi¬ 
ple evolution times, which can reduce the contribution 
of the fast modes efficiently. In our previous report^, 
we explained the combination of two proposed methods 
involving the PCRMA method using multiple evolution 
times, and demonstrated its applicability to an all-atom 
MD simulation of human lysozyme in aqueous solution. 

Recently, methods to analyze the dynamics and kinet¬ 
ics of protein simulations have been developed. In par¬ 
ticular, the Markov state model has been developed (see 
Refs . I28l - l33l and reviews^ - — and references cited therein) 
and used for many protein systems. The Markov state 
model can analyze transitions between local minimum- 
energy states, which are identified from clustering anal¬ 
ysis methods. This is a powerful method for analyzing 
dynamics in the context of both long and short simu¬ 
lations of proteins. As mentioned above, in RMA, the 
relaxation modes and rates are given as left eigenfunc¬ 
tions and eigenvalues of the time-evolution operator of 
the master equation of the system, respectively^ 13 (see 
Appendix A). From this point of view, RMA is related 
to Markov state models fsee IIID1 Appendices A and B, 
and Ref. [38h . 

An analysis method of protein dynamics based on MD 
simulation was proposed^! that separates linearly super¬ 
imposed statistically independent signals by using time 
correlation functions, and was thus designated as time- 
structure based independent component analysis (tICA). 
The method is closely related to RMA, in that it deter¬ 
mines statistically independent modes by solving the gen¬ 
eralized eigenvalue problem with to = 0. Recently, the 
combination method of tICA and a Markov state model 
was also propose d 38 ' 39 . In Ref. HH a Markov state model 
was constructed from clustering in the subspace deter¬ 
mined by tICA, which was calculated using a method 
similar to PCRMA. 

In this study, we applied PCA and RMA to analyze the 
folding trajectories of a peptide, chignolin, near its fold¬ 
ing transition temperature. As mentioned above, PCA 
cannot extract order parameters to easily identify the lo¬ 
cal minimum-energy states for a system with large con¬ 
formational changes. This is because PCA extracts the 
modes with large structural fluctuations around an av¬ 
erage structure, and the PC modes with large struc¬ 
tural fluctuations do not correspond to the transitions 
between the local minimum-energy states. On the other 
hand, RMA can extract slow relaxation modes. The local 
minimum-energy states are usually stable and the system 
thus remains in this state for a long time during a simula¬ 
tion. The order parameters with slow relaxation may cor¬ 
respond to the directions between local minimum-energy 
states. Thus, slow relaxation modes may be suitable or- 
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der parameters to identify local minimum-energy states 
and the transitions between them. 

To test this model, we performed a simulation of chig- 
nolin near its transition temperature. Chignolin consists 
of a 10 anrino-acid polypeptide that adopts a /3-hairpin 
turn structure, and is the smallest artificial protein-^, 
which has been used for testing new simulation algo¬ 
rithms. Indeed, there are many simulations for chignolin 
reported to date^i - — . Previous research has shown that 
chignolin has a stable state (native structure) and a mis- 
folded state. The native and misfolded structures are 
hairpin-like structures (see Fig. They have a com¬ 
mon turn structure from Asp3 to Glu5 and have slightly 
different hydrogen bond patterns. Furthermore, the sta¬ 
bility of the two states, i.e., how often these two states 
are observed during a simulation, has been found to de¬ 
pend on the force fielct 44 . We considered that by in¬ 
creasing temperature, many transitions between the two 
states may occur within a several hundred-nanosecond 
simulation. Thus, we performed the simulation near the 
protein’s transition temperature, which corresponds to a 
long simulation at room temperature. For estimation of 
the transition temperature, we refer to the results at a 
pressure of 1 atm obtained by the simulations in Ref. l45l . 
We calculated the free-energy surfaces of the coordinates 
of PCA and RMA and show the effectiveness of RMA for 
extracting order parameters of the protein system with 
large conformational changes. 

We also explain the relationship between RMA and 
Markov state models and propose a new analysis method 
called Markov state relaxation mode analysis (MSRMA). 
We evaluate the relaxation times of the states, which 
are defined by the free-energy surface from RMA, us¬ 
ing a simple Markov state model and the new MSRMA 
method. 


II. METHODS 


A. Principal component analysis 


PCA is a well-known method for analyzing the static 
properties of protein structural fluctuations obtained in 
a protein simulation system— - —. 

We now consider a biopolymer composed of N atoms. 
We assume that R is a 3-/V-dimensional column vector 
that consists of a set of coordinates of atoms relative to 
their average coordinates. Namely, 


R T = (r[ , r ' 2 ,..., r' N ) = (x' 1 ,y' 1 ,z' 1 ,... ,x' N ,y' N ,z' N ) 

( 2 ) 

with 


r 'i = r i- (n) , (3) 

where r, is the coordinate of the ith atom of the biopoly¬ 
mer and {ri) is its average coordinate. 

In PCA, the eigenvalue problem is solved as 

CF n = A n F n with F^F n = 5 m , n . (4) 


Here, C = (RR T ) is the 3N x 3N variance-covariance 
matrix: C = (C) j) with C*j = ( R+Rj). We now set the 
indices of the eigenvalues so that the relation Ai > A 2 > 
• • • > A 3 jv holds. The eigenvector F n with the eigenvalue 
A n is referred to as the ?rth principal component axis. 
Note that A 3 N -5 = A^n-a = ■ ■ ■ = A 3 N = 0 because 
the translational and rotational degrees of freedom are 
removed. The coordinate R can be expanded in terms of 
the PCA eigenvectors: 

3N-6 

R = ^2 ®™F n with = F^R. (5) 

71=1 

Here, <J> n is called the nth principal component. The 
variance of $> n is given by A n . 


B. Relaxation mode analysis 

The RMA method was developed to identify slow mo¬ 
tions in random spin systems and homopolymer systems. 
In RMA, we consider the following physical quantities: 

(</>n(t)0m(0)) = S n:m e~ Xnt , (6) 

where cj) n and X n are the relaxation mode and its re¬ 
laxation rate, respectively. Here, (A(t)B{ 0)) denotes the 
equilibrium time correlation function of physical quantity 
A at time t and B at time 0. 

In the following, we briefly explain how to use RMA to 
extract slow relaxation modes and their relaxation rates 
from an MD simulation that satisfies the detailed bal¬ 
ance condition. Here, we are only dealing with position 
coordinates because the relaxation of velocities (on a pi¬ 
cosecond time scale) is usually faster than that of slow 
collective modes for coordinates (on a nanosecond time 
scale). In this case, the calculation process is the same 
as that for Monte Carlo simulations (see Ref. f25h . Note 
that we describe the theoretical background of RMA in 
Appendix A and the application of the RMA method for 
an MD simulation in detail in Appendix B. 

We use the following function as an approximate re¬ 
laxation mode: 

= f p T e~ rU °/ 2 R (7) 

Here, R is given by Eq. © and r is the time-evolution 
operator of the master equation of the system (see Ap¬ 
pendices A and B). The operator e _r ^°/ 2 is used to re¬ 
duce the contributions of faster modes in R, and the trial 
function becomes a better approximation as to becomes 
larger—. Note that Eq. dTJ) is the same as Eq. (30) of 
Ref. [ 25 ]. X. p is a trial function, which is constructed as a 
linear combination of the expectation value of R after a 
period to/2. 

For the trial functions, the variational problem is 
equivalent to a generalized eigenvalue problem 

C(t 0 + T)f p = e- x ^C(t 0 )f p , 


( 8 ) 


4 


where A p is the relaxation rate corresponding to X p . The 
matrix C{t) is defined by 

C(t) = (R(t)R T ( 0)). (9) 

The orthonormal condition for X p is written as 

fp C(to)fq = 5 p ,q- ( 10 ) 

Note that the number of meaningful relaxation modes is 
3 N — 6 because the translational and rotational degrees 
of freedom are removed^. 

The time correlation functions of the approximate re¬ 
laxation modes obtained can thus be written as 

(X p (t)Xg( 0)) ~ S P:q e p . (11) 

This relation holds exactly for t = 0 and t = r and is 
expected to hold approximately for other values of t > 0. 

From the orthonormal condition, the inverse transfor¬ 
mation is derived as 

3N-6 

e-r Uo/2 R = J2 9 P X P1 (12) 

P= 1 

where 

g p = C(t 0 )f P . (13) 

The time correlation functions of R are given as 
C(t) = (R(t)R T m 

3N—6 3N—6 

= J2 9 p g v <• x p~ t °) x '?( 0 )) 

P= 1 9=1 

3N-6 

- 9 P g p e~ Xpit ~ to) 

p= 1 
3N—6 

= 9p9p e ~ Xpt ( 14 ) 

P= 1 

for t > to. Here, 

g p = e Xpt °' 2 g p . (15) 

Equations m and q lead to the relaxation mode ex¬ 
pansion of R: 

3N-6 

R ^ J2 g P X P ■ ( 16 ) 

p=i 

Because we are dealing with position coordinates only, 
C(t) is a symmetric matrix owing to the detailed bal¬ 
ance conditional, and if to is sufficiently large, the relax¬ 
ation rates X p are real and positive, which correspond to 
pure relaxation. Then, we set the indices of X p so that 
0 < Ai < A 2 < ■ • ■ holds. On the basis of the approxi¬ 
mate relaxation modes and rates, the correlation matrix 


C(t) with t > to can be calculated. The accuracy of the 
present estimation of the relaxation modes and rates can 
be examined through comparison of the correlation ma¬ 
trices C(t) calculated by the present method with those 
estimated directly by other means. In general, the time- 
displaced autocorrelations C l: i(t) are compared to check 
the obtained relaxation modes and rates. 


C. Calculation of the free-energy surface 

In PCA, from the probability density P(<bp, <f> 9 ) of 
and 4*^, the dimensionless free energy, which is the free 
energy divided by fce T, along the pth and qth principal 
component axes is calculated as 

F($p,$,) = -lnP(4>p,$ 9 ). (17) 

Here, fce and T denote the Boltzmann constant and the 
temperature of the system, respectively. In the follow¬ 
ing, we abbreviate “dimensionless free energy” as “free 
energy” for simplicity. The indices p and q are usually 
set to numbers corresponding to large eigenvalues (e.g., 
p = 1 and q = 2). 

In RMA, the quantity Y p , corresponding to in PCA, 
is defined by 

Y p = X p \g p \. (18) 

Then, the free-energy surface as a function of Y p and Y q 
is calculated as 

F(Yp,Y q ) = -\nP(Y p ,Y q ), (19) 

where P{Y p , Y q ) denotes the probability density of Yp and 
Y q . Here, X p is calculated from R as follows. Because 
of Eqs. itlDt and (fl3l) . f p g q = S Piq holds, which leads 
to f p g q = e Xpto/ 2 5 p ^ q . Therefore, by multiplying fj to 
both sides of Eq. ([TO , X p is given as a function of R as 


A p = fjR 

(20) 

— e -Apt 0 /2 

(21) 


D. Markov State Relaxation Mode Analysis 

In this subsection, we consider the relation between 
RMA and Markov state models, and propose the new 
method of MSRMA. In the simplest Markov state model, 
the phase space of the system, where only the posi¬ 
tion coordinates are considered, is divided into clusters 
(subsets) Si, i = 1 First, the joint probability 

Pi,j(r) = P(Q G Si,r\Q G Sj, 0) that the state of the 
system Q is in the j th cluster at time 0 and is in the ith 
cluster at time r > 0 is calculated in a simulation. Sec¬ 
ond, the transition probability Ti^(r) that the state of 
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the system is found in the *th cluster after time r starting 
from a state in the jth cluster is calculated by 


the conditional probability density T t (Q\Q') defined by 
(1A13I) . leads to a generalized eigenvalue problem 


TiAr)=PiAr)/Pj, ( 22 ) 


C(t 0 + r)f p =e-^ T C(t 0 )f P (28) 


where pj = P(Q G Sj) is the probability that the state 
of the system is found in the jth cluster, which is esti¬ 
mated in the simulation. Then, by solving the eigenvalue 
problem 

/ p t T(t) = fp T A p (23) 


for the transition matrix T(r) = (T)j(r)), the ptlr eigen¬ 
vector fp and its eigenvalue A p are obtained. The eigen¬ 
vector fi oc (1,1,...,1) T corresponds to the equilibrium 
state and its eigenvalue Ai = 1. Other eigenvectors 
f p represent structural transitions and the correspond¬ 
ing eigenvalues A p give their relaxational timescales f p 
as 


r 

In A p 


(24) 


Note that in the Markov description, it is important 
that the states are defined in a kinetically meaningful 
wa y 28 ’ 38 . We need to define the states that are classified 
by order parameters representing the dynamics and ki¬ 
netics of the system. Even with a good choice of states, in 
order for a Markov description of the process to be accu¬ 
rate, the time interval r should also be chosen carefully. 
In other words, for a Markov description to work, the 
time interval of the transition matrix r must be chosen 
appropriately so that it is as large as the slowest relax¬ 
ation time of the states. When plotting f p as a function 
of r, ? v slowly converges to the appropriate time scale 
when t is increased. In addition, when a much longer 
r than the slowest relaxation time of the states is used, 
the Markov state model is not expected to be accurate. 
Thus, we usually set the time interval r to the value when 
the variation of r p is sufficiently fla t 28 ’ 38 . 

The above-mentioned procedure of the Markov state 
model is related to the following procedure of RMA. We 
call the following procedure the MSRMA. Similar to the 
description above in Section Hi B1 we consider an approx¬ 
imate relaxation mode given by 

A'p = /p T e- rt ‘°/ 2 A, (25) 


where A, as a function of the state Q of the system, is 
defined by 

A(Q) = (di(Q),d 2 (Q),...A(Q)) T (26) 


with 


Si(Q) 


1 for Q G Si, 
0 for Q £ Si. 


(27) 


The operator e~ rt *°/ 2 reduces the contributions of faster 
modes in A. Then, the variational problem, which is 
equivalent to the eigenvalue problems (IB 1 1) and (IB2I) for 


or 

f P T C(t 0 + r) = e~^ T f p T C(t 0 ) (29) 

with 

fp T C(t 0 )f q = S p , q , (30) 

where 

C(t) = (A(f)A T (0)). (31) 

Because C(t) = C(t ) T , Eqs. (1551) and (1551) are equiv¬ 
alent. It follows from the definition of A that the 
(i,j) component of C(t) is the joint probability Pij(t): 

c(t) = (A At)). 

If we set to = 0) the generalized eigenvalue problem 
(1551) becomes the eigenvalue problem (1551) with A p = 
e -\ P r or ^ _ i/A £ , because (7(0) = diag(pi,... ,p n ) 
and C(t)C(0 )~ 1 = T(r). Thus, the Markov state model 
is a special case of MSRMA with to = 0. Because the 
operator e _rtt °/ 2 in Eq. (ESD reduces the contributions 
of faster modes in A, the solutions of the generalized 
eigenvalue problem (051) or (051) become better approx¬ 
imations to the slow relaxation modes and rates as to 
becomes larger. Therefore, the relaxation times f p ob¬ 
tained by the Markov state model are expected to be 
improved by solving Eqs. 051) or 051) with to > 0 rather 
than Eq. 051) . 


III. COMPUTATIONAL DETAILS 

An MD simulation was performed with the AMBER 
package (AMBER 11.0) with GPGPU using the ff99SB 
force field and TIP3P models. Chignolin consists of a 
10-amino acid sequence: GYDPETGTWG. We gener¬ 
ated an extended structure of chignolin using the leap 
command and solvated it with a 15 A buffer of TIP3P 
water around the peptide in each direction. The num¬ 
bers of atoms of chignolin and water molecules are 138 
and 10,941 (3647 water molecules), respectively. Two 
potassium ions (Na + ) are included in the system, result¬ 
ing in a net-neutral system. The total number of atoms 
in the system is 11,081. After energy minimization and 
heating, equilibration at a constant pressure (1 atm) and 
450 K, a 50-ns MD simulation, was performed. Finally, a 
750-ns MD simulation was performed following the equi¬ 
libration at 1 atm and 450 K. We used a time step of 
2 fs. The Langevin thermostat with a friction constant 
7 = 2.0 ps _1 was used for temperature control. The cut¬ 
off is 8 A, which was used to limit the direct space sum 
for the Particle Mesh Ewald (PME) method of AMBER. 
For the equilibration and production run, pmemd with 
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GPGPU for MD simulations was used. For analysis, the 
coordinates were saved every 10 ps. The number of sam¬ 
ples was 750,000. 

We used the coordinates of Cq, atoms on the backbone 
as coordinates, so that the degrees of freedom was 30. 
After removing the translational and rotational motions 
from the coordinates of C a atom s 47 ’ 48 , PCA and RMA 
were carried out on the coordinates of C a atoms. For 
RMA, we set t 0 and r to 10.0 ps and 20.0 ps, respectively. 

IV. RESULTS AND DISCUSSION 

A. Simulation Results 

We performed a 750-ns MD simulation of chignolin in 
aqueous solution at 450 K. Before the present simulation, 
we performed two several-ys simulations of chignolin at 
300 K. In one simulation, chignolin folded to the native 
structure (see Fig. H a )) while in the other simulation, 
it folded to the misfolded structure (see Fig. HKb)). Af¬ 
ter folding to the native or misfolded structures once, 
these structures were maintained, with few transitions 
between the structures, because the simulation times 
were not sufficiently long. To allow for large conforma¬ 
tional changes between the structures to occur frequently 
and to be able to observe numerous transitions during a 
several hundred-ns simulation, we performed the simula¬ 
tion close to the transition temperature. 

The time series of the RMSD of C a atoms from the 
native structure is shown in Fig. [I] Here, the native 
structure is the first coordinate of lUAO.pdb. RMSD 
is calculated after fitting the obtained structures to the 
native structure. Many transitions among the native-like 
structures (RMSD « 1 A), misfolded structures (RMSD 
«2 A), and unfolded structures (RMSD «5 A) occurred 
during the simulation. 

This system is suitable for testing the effectiveness of 
RMA, because of the many transitions observed among 
structures, including with the completely unfolded struc¬ 
tures. The present simulation conducted at 450 K corre¬ 
sponds to a long simulation conducted at 300 K, where 
many transitions are observed, similar to the observations 
in Ref. [3. We next analyzed the dynamics of chignolin 
in the system using PCA, RMA, and the newly proposed 
combined method MSRMA. 


B. Results of principal component analysis 

After a 750-ns production run was performed and the 
translational and rotational motions were removed from 
the coordinates of C a atoms — we applied PCA to the 
system. The five largest eigenvalues obtained by PCA are 
listed in Table I. The pth eigenvalue is the variance of the 
pth principal component. The first PC mode mainly con¬ 
tributes to the global conformational fluctuation around 
the average structure. Figures Ha) and Hb) show the 


free energy surfaces along the first and second PC axes 
($i vs. $ 2 ), and the second and third PC axes ($2 vs. 
$ 3 ), respectively, given by Eq. m- We also show the 
free-energy surface of $1 and $3 in Fig. H c )> because 
the two modes have a significantly slower relaxation than 
the others (see Fig. Q] and its discussion). The relations 
between RMSD from the native structure and PC com¬ 
ponents ((a) $ 1 , (b) $ 2 , or (c) H> 3 ) are shown in Fig. [3] 
In Fig. H a )> a local minimum-energy state can be ob¬ 
served near the value of <f>i ~ —4. As the values of $1 
become large, the values of RMSD also become large. 
These results indicate that the direction of the first PC 
mode corresponds to the transition between the compact 
and unfolded structures. From the free-energy surface 
of $1 and <I> 2 , we could not classify the native and mis¬ 
folded states because the conformational difference be¬ 
tween them is small compared with the conformational 
fluctuations of the system. As shown in Fig. Hb), the 
direction of the second PC mode corresponds to a large 
fluctuation around slightly compact structures (RMSD 
« 4), which does not correspond to the native and mis¬ 
folded structures. Because of the two local minimum- 
energy states observed in Figs. Hb) and H c ) along the 
third PC axis, the direction of the third PC mode cor¬ 
responds to the transition between two minimum states, 
which correspond to the native and misfolded states, even 
though there are many overlapping unfolded structures 
(see also Fig. [10]). The free-energy surface of $1 and $3 
can classify not only the native and misfolded states but 
also the unfolded state. It is suggested that it is more 
effective to use PC modes with slower relaxation rather 
than those with larger conformational fluctuations as the 
axes of the free-energy surface. However, PCA does not 
provide time information, and the native and misfolded 
states still show slight overlap on the free-energy surface 
of $1 and $3 (see Fig. [TUl'c') and its discussion). 

Therefore, we next calculated the time-displaced auto¬ 
correlation functions of $ 1 , $ 2 , and $ 3 , which are shown 
in Fig. [4] The first and third PC modes (in red and blue, 
respectively) show slow relaxation, while the second PC 
mode (in green) shows relatively faster relaxation. The 
second PC mode corresponds to large fluctuation of the 
slightly compact structures (RMSD « 4), which are not 
completely compact like the native structure. Therefore, 
PCA could extract the large conformational fluctuation 
around the slightly compact structure as the second PC 
mode, even though its relaxation is faster. 


C. Results of relaxation mode analysis 

We applied RMA to the same coordinate data analyzed 
with PCA described above. In Supplemental Figure SI—, 
the time-displaced autocorrelation functions of the x, y, 
^-coordinates for the «th G a atom obtained by the simu¬ 
lation directly and reproduced by RMA from Eq. (TUO) are 
compared. These functions showed good agreement over¬ 
all. This means that the appropriate relaxation modes 
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TABLE I. The eigenvalues A p of the first to fifth PC modes, 
and their relaxation times t p (= 1/A p ) and conformational 
fluctuations g p of the first to fifth slowest relaxation modes. 



PCA 

RMA 

V 


t p (ps) 

(A 2 ) 

1 

20.17 

3.7 xlO 3 

6.93 

2 

7.01 

2.4 xlO 3 

10.81 

3 

4.53 

1.5 xlO 3 

1.85 

4 

2.58 

7.5 xlO 2 

2.87 

5 

1.60 

6.4 xlO 2 

1.74 


and times were obtained. 

Table I shows the relaxation times and conformational 
fluctuations of the five slowest relaxation modes (RMs). 
The three slowest relaxation modes were slower than the 
other modes. The second slowest relaxation mode showed 
the largest conformational fluctuation. Figure Et a) and 
EJb) show the free-energy surfaces along the first and sec¬ 
ond slowest RM axes (Y-i vs. Y 2 ) and the second and third 
slowest RM axes (Y 2 vs. Y 3 ), respectively. In Fig. El the 
relations between the RMSD from the native structure 
and RM components (Yi(a), y 2 (b), or Y 3 (c)) are shown. 
In Fig.[5fa), there are four characteristic regions. Two of 
them correspond to the local minimum-energy states of 
the native and misfolded structures. Random structures 
are located at the larger values of Y 2 and I 3 . There is 
a free-energy minimum near the origin of Fig. [5ja). We 
refer to the state of the minimum as the intermediate 
state. Figures Ela) andE^a) indicate that the direction 
of the first RM corresponds to the transition between the 
native and misfolded states. Figures EI a ) andElb) indi¬ 
cate that the direction of the second RM corresponds to 
the transition between the misfolded state and the inter¬ 
mediate state. Furthermore, Figs.EIb) and EIc) suggest 
that the direction of the third RM may correspond to the 
transition between compact structures and the unfolded 
state. 

Note that the normalized {g p } are not orthogonal to 
each other, in contrast to {.F p }. However, if the local 
minimum-energy states tend to be located in parallel 
along the slow RM axis, the direction of the axis corre¬ 
sponds to that for the transition between local minimum- 
energy states. In the previous work of Ref. HH, low free- 
energy paths, which connect to local minimum-energy 
states, were observed along the slow RM axes. The same 
tendency was observed in the present study. Fig. El shows 
the time-displaced autocorrelation functions of the pth 
RM component Y p . The relaxation of ( Y p (t)Y p (0 )) be¬ 
comes gradually faster as p becomes larger. Therefore, 
we succeeded in obtaining the order parameters with slow 
relaxation. 

The suitable order parameters to identify the native 
and misfolded structures in the chignolin system have 
been identified in previous work, as follows: the distance 
between the amide nitrogen atom of Asp3 (Asp3N) and 


the carbonyl oxygen atom of Gly7 (Gly70), D(Asp3N- 
Gly70), and that between Asp3N and the carbonyl oxy¬ 
gen atom of Thr 8 (Thr80), D(Asp3N-Thr80)^i. These 
order parameters could clearly distinguish between the 
native and misfolded states. Fig. [ 8 ] shows the free-energy 
surface along D(Asp3N-Gly70) and D(Asp3N-Thr80) 
obtained by the present simulation. The shape of the 
free-energy surface is similar to that obtained by Refs. 
lill and Ej. Although these order parameters are pow¬ 
erful for the system of chignolin, we generally need to 
search for good order parameters that will depend on the 
system. From RMA, we obtained good order parameters 
automatically not only to identify the local minimum- 
energy states but also to investigate the transitions be¬ 
tween them. 

In addition, we obtained interesting results for the in¬ 
termediate state from RMA. The structures extracted 
for the four regions are listed in Table II, which corre¬ 
spond to the native, misfolded, intermediate, and un¬ 
folded states. The numbers of structures extracted for 
the native, misfolded, intermediate, and unfolded states 
are 24,824, 14,571, 5251, and 1806, respectively, and 
the total number of samples was 75,000. Fig. [9] shows 
movies of the structures in the four clusters (Multimedia 
view). We extracted every 100 structures for the native 
and misfolded states, every 20 structures for the inter¬ 
mediate state, and every 10 structures for the unfolded 
state to reduce the file size. Each movie shows only a 
set of structures and not a time sequence; thus, the or¬ 
der of the structures has no meaning. The native and 
misfolded structures appear as hairpin-like structures. 
The native state is stabilized by four hydrogen bonds: 

(a) between the oxygen atom of the side-chain carboxyl 
group of Asp3 (Asp30i) and the amide nitrogen atom 
of Thr 6 (Thr 6 N), (b) between Asp30^ and the oxygen 
atom of the side-chain of Thr 6 (Thr60 7 ), (c) between 
Asp3N and Thr80, and (d) between the carobonyl oxy¬ 
gen atom of Glyl (GlylO) and the amide nitrogen atom 
of G1Y10 (GlylON). In addition, the side chains of Trp9 
and Tyr2 are generally located on the same side. On 
the other hand, the misfolded state is stabilized by five 
hydrogen bonds: (a) between Asp30^ and Thr 6 N, (b) 
between Asp30^ and Thr60 7 , (c) between the carbonyl 
oxygen atom of Asp3 (Asp30) and the amide nitrogen 
atom of Gly7 (Gly7N), (d) between Asp3N and Gly70, 
and (e) between GlylO and the amide nitrogen atom of 
Gly9 (Gly9N). In addition, the side chains of Trp9 and 
Tyr2 are generally located on opposite sides. The termi¬ 
nal residues show large fluctuations. The misfolded state 
has different hydrogen bond patterns between Asp3N and 
Gly70 and between GlylO and Gly9N. The obtained 
native and misfolded structures were similar to those ob¬ 
tained in previous studies^iGS. 

In the intermediate state, chignolin tends to form a 
turn from Asp3 to Glu5, similar to the native and mis¬ 
folded states. The intermediate state is stabilized by two 
hydrogen bonds: (a) between Asp30^ and Thr 6 N, and 

(b) between Asp30,s and Thr60 7 . These hydrogen bonds 











are also formed in the native and misfolded states. Table 
III shows the average RMSD values of residues from Asp3 
to Glu5 and from Tyr2 to Trp9 for the four clusters. Here, 
we calculated the values using Visual Molecular Dynam¬ 
ics (VMD)Sa. The average RMSD values of residues from 
Asp3 to Glu5 of the native, misfolded, and intermediate 
structures were all small. These results indicate that the 
intermediate state has the characteristic structure, i.e., 
the turn structure from Asp3 to Glu5. 

In Fig. S3^, we show the Ramachandran plots of each 
residue from Tyr2 to Trp9 for the four clusters. The plots 
of each residue from Tyr2 to Thr 6 for the native and mis¬ 
folded states are similar to each other. The difference of 
the backbone dihedral angles of Gly7 causes the different 
hydrogen bond patterns observed between the native and 
misfolded states. The plots of each residue from Asp3 to 
Glu5 for the intermediate state are similar to those for 
the native and misfolded states, indicating the forma¬ 
tion of a turn structure. These results demonstrate that 
the native, misfolded, and intermediate structures have 
the same turn structure. The main difference between 
the unfolded state and the other states is in the distribu¬ 
tion of the dihedral angles of Pro4, as shown in Fig. S2^. 
This difference is responsible for the large RMSD value of 
residues from Asp3 to Glu5 of the unfolded state. Based 
on the structures of the four states obtained by RMA, 
we suggest that the dihedral angles are also good order 
parameters to classify the states in this system. 

A previous report ^ 2 of a long simulation suggested 
that chignolin folds to the native or misfolded structures 
through the turn structure present in the intermediate 
state. The authors checked for the turn formation along 
their trajectories. Fig. S3^ shows the RMSD values from 
the native structure for the four states defined in Table 
III as a function of time. We observed the intermediate 
state in transition not only between the native and un¬ 
folded states but also between the native and misfolded 
states. Therefore, the slow modes extracted by RMA 
clarify the conformational transitions in the folding and 
unfolding processes. 

Fig. [TO] shows the distributions for the native, mis¬ 
folded, and intermediate states classified on the free- 
energy surfaces obtained from RMA in the planes of (a) 
d>i and $ 2 , (b) <k 2 and $ 3 , and (c) $1 and $ 3 . Although 
the free energy of $1 and $ 2 cannot distinguish between 
the native and misfolded states, that of <h 2 and $3 could. 
As shown in Fig. [TOl' c'l. there are still overlaps between 
the native and misfolded states. The structures in the in¬ 
termediate state are distributed widely on the free-energy 
surface obtained by PCA and overlap on the areas of the 
folded and misfolded states corresponding to the free- 
energy surface of $ 2 and <J> 3 . PCA could not identify 
the intermediate state, because the structural fluctua¬ 
tion near the terminal residues of the intermediate state 
is large and PCA extracts large structural fluctuations. 
RMA could extract the characteristic structure, i.e., the 
turn structure from Asp3 to Glu5, because the large fluc¬ 
tuations of the terminal residues have fast relaxation, and 


RMA neglects the faster conformational changes. 

These results strongly support the utility in cluster¬ 
ing structures using the free energy obtained from RMA. 
We next investigated the transitions between the four 
clusters in detail. Fig. [TlT a') shows the free energy sur¬ 
face of Yj and Yj with more contour lines. The free en¬ 
ergy differences between the intermediate and the other 
three states are shown schematically in Fig. ITTT bh The 
free energy difference from the intermediate state to the 
unfolded state was calculated to be approximately 2.1 
kcal/mol. The free energy barrier seems to be small from 
the unfolded state to the transition state between the 
unfolded and intermediate states. Furthermore, given 
that the free energy difference from the intermediate 
state to the transition state toward the native state (0.5 
kcal/mol) is lower than that toward the misfolded state 
(0.9 kcal/mol), the transition from the intermediate state 
to the native state appears to be easier than that to the 
misfolded state. These results suggest that the main 
folding path of chignolin from the unfolded to the na¬ 
tive states passes through the intermediate state. The 
free energy difference from the native or misfolded states 
to their transition states toward the intermediate state 
were both high (2.2 or 2.0 kcal/mol, respectively). One 
reason could be that the hydrogen bonds between back¬ 
bone atoms need to be broken in order to change from 
the native or misfolded states to the intermediate state. 
The slowest relaxation process is mostly related to the 
transition between the native and misfolded states; it 
corresponds to escape from the native state and the mis¬ 
folded state to intermediate states. The second slowest 
relaxation process seems to be related to the transition 
between misfolded and unfolded states; it corresponds to 
escape from the misfolded state to the intermediate state 
and from the intermediate state to the unfolded state. 


D. Results of Markov state relaxation mode analysis 

Recently, Markov state models have been constructed 
in the discrete state space defined by the clustering in 
subspace determined by tIC A 38 i 39 . In the present work, 
we classified the structures into a smaller number of 
states by using the free-energy surface obtained by RMA, 
and then applied the Markov state model and MSRMA 
to analyze the states. 

We divided the free-energy surface of Yj and 1 2 to 
the four regions shown in Fig. [[Ha), and classified the 
structures into the following four states: native (Sj), mis¬ 
folded (£ 2 ), intermediate (£ 3 ), and unfolded (£ 4 ) states. 
After calculating the trajectories of 5i (i =1,- • • ,4), we 
solved the generalized eigenvalue problem of Eq. (1281) . 
Because C(t) is a symmetric matrix, C(t) = C(t) , we 
used \{C(t) + C(t ) T ) instead of C(t). 

Fig. [T21 shows the relaxation times f p = 1 /A p obtained 
by MSRMA as a function of r when f 0 =0, 10, 50, 100, 
200, and 500 ps. Because the first eigenvector / 1 , pro- 
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portional to (1,1,1,1) T , corresponds to the steady state 
with infinite relaxation time T\ = oo, we show the second 
(a), third (b), and fourth (c) longest relaxation times in 
Fig. [12] The lines of to = 0 correspond to the results 
of a simple Markov state model. In the case of to = 0, 
f p (p =2, 3, and 4) values slowly approach the appro¬ 
priate time scale, i.e., the values for plateau regions or 
peak values of the solid lines of Fig. [T2j when r is in¬ 
creased. Using the method for determining the value of 
r described in Section IID, the appropriate time interval 
for r was determined to be around 3 ns, which corre¬ 
sponds to the time scale of the slower relaxation times 
obtained by RMA. The relaxation times at r = 3 ns for 
the second, third, and fourth slowest relaxation modes 
were approximately 5, 3, and 1.8 ns, respectively, which 
are close to but slightly higher than those obtained by 
RMA at 3.7, 2.4, and 1.7 ns, respectively. 

For the lines of t 0 > 0, the values of f p quickly ap¬ 
proach the appropriate time scale; i.e., those correspond¬ 
ing to the values for plateau regions or peak values of 
Fig. [12] Therefore, the slow relaxation times can be im¬ 
proved when applying MSRMA with to > 0, which is 
introduced in order to reduce the relative weight of the 
faster modes contained in the physical quantities given 
by Eq. (1251) . The relaxation times are improved even if 
to is small; in particular, the appropriate relaxation times 
are obtained using a shorter value for the time interval r. 
The estimated relaxation times were approximately 5.5 
ns, 3.5 ns, and 2.0 ns (using to = 0.2 ns and r = 1 ns). 

The normalized time-displaced autocorrelation func¬ 
tions of A both calculated directly and reproduced by 
MSRMA using Eq. (fT-Il) are shown in Fig. [T3] The func¬ 
tion is given by 




mo)-(m)) 2 ) 

(5 z (t)6m)-p(*y 2 

p(i) — p(i) 2 


(32) 


Note that, in the present case, the summation in Eq. 
(fTTll is taken from p =1 to 4. The results obtained di¬ 
rectly shown in Fig. [13] indicate that the time-displaced 
autocorrelation functions of i = 1 and 2, which corre¬ 
spond to the native and misfolded states, respectively, 
have similar slower relaxations. The time-displaced au¬ 
tocorrelation function of i =3 and 4, which correspond to 
the intermediate and unfolded states, respectively, have 
similar relaxations, which are slightly faster compared to 
those of i =1 and 2. 

The correlation matrix C(t) reproduced by Eq. (fill) 
must be equal to that directly calculated by the simula¬ 
tion at t = to and t = to +t, because Eq. m holds ex¬ 
actly for t = 0 and t = r, as mentioned above in Section 
EH Therefore, as shown in Fig. [131 all lines of Ci,i(t ) 
reproduced by Eq. (fill) go through the points of directly 
calculated Ci i(t) at t = to and t = to + t. In the case 
of to = 0 ps and r = 1000 ps, shown in Fig. [13] (a), 
reproduced Ci^it) is larger than the directly calculated 


for t 0 = 0 < t < t 0 + t = r, and is smaller for 
to + t = t < t. As a result, the relaxation times ob¬ 
tained by MSRMA are underestimated. This is due to 
the above-mentioned restriction and the existence of fast 
relaxation modes in the directly calculated which 

causes the fast initial decay of Ci^{t). These fast relax¬ 
ation modes cannot be described by the three relaxation 
modes used in MSRMA. Thus, the results of MSRMA 
with to = 0 are improved by using a longer r as shown 
in Fig. nan which is the usual method for the sim¬ 
ple Markov state model. Comparison of the results of 
MSRMA for to = 10 ps and r = 1000 ps shown in Fig. 
me) with those for to = 0 ps and r = 1000 ps shown in 
Fig- rnil a ). and comparison of those for to = 10 ps and 
r = 3000 ps shown in Fig. [Idl' d) with those for t 0 = 0 ps 
and r = 3000 ps shown in Fig. [TsT bl make it clear that 
the results of MSRMA are improved by using finite to, 
even if to is small. Fig. [T2] clearly shows that the relax¬ 
ation times obtained for to = 10 ps and r = 1000 ps are 
similar to those obtained for to = 0 ps and r = 3000 ps. 
The results for t 0 = 200 ps and r = 100 ps shown in Fig. 
me) and those for to = 500 ps and r = 100 ps shown in 
Fig.mf) demonstrate that a good description of the re¬ 
laxation of the states can be obtained by using MSRMA 
with finite to, even if r is small. Note that the values 
of to and t 0 + t [200 ps and 300 ps for Fig. fT3l el. and 
500 ps and 600 ps for Fig. fldl' e)] are much smaller than 
the value of r (3000 ps) used to estimate the relaxation 
times by MSRMA with to = 0, i.e., the simple Markov 
state model. 

TABLE II. The definitions of four clusters: native state (Na¬ 
tive), misfolded state (Misfolded), intermediate state (Inter¬ 
mediate), and unfolded state (Unfolded). The total number 
of samples in the four regions is 46,452, and the total num¬ 
ber of samples overall is 75,000. The structures in the ranges 
from min to max of Yi and Yi (for native, misfolded, and 
intermediate states) or Yi and Y 3 (for unfolded state) were 
extracted. 


Cluster 

Native 

Misfolded 

Intermediate 

Unfolded 


min:max 

min’.max 

min:max 

min:max 

Yi 

—4.3:—1.8 

1.2:4.2 

1.2:3.7 


Yi 

—2.0:1.0 

—6.5:—2.5 

1.5:4.0 

5.0:16.0 

Yi 




4.0:10.0 

No. of samples 

24824 

14571 

5251 

1806 


V. CONCLUSIONS 

In this study, we applied RMA to extract reaction 
coordinates for a protein system characterized by large 
conformational changes such as folding/unfolding sim¬ 
ulation. We performed a 750-ns simulation of chigno- 
lin near its transition temperature and observed many 
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TABLE III. The average RMSD values (A) for the four clus¬ 
ters. The structures in all clusters were fit to the first struc¬ 
ture in the native state using backbone atoms from Asp3 to 
Glu5. Then, the average RMSD from the average structure 
of each cluster was calculated for the backbone atoms from 
Tyr2 to Trp9 or from Asp3 to Glu5. 



Tyr2 to Trp9 

Asp3 to Glu5 

Native 

1.198 

0.185 

Misfolded 

1.652 

0.212 

Intermediate 

3.671 

0.245 

Unfolded 

8.406 

0.993 


transitions between the most stable, misfolded, and un¬ 
folded states. RMA could extract the slow relaxation 
modes. The local minimum-energy states were usually 
stable and the system remained in this state for a long 
time during the simulation. The order parameters with 
slow relaxation corresponded to the directions between 
the local minimum-energy states. Therefore, these slow 
relaxation modes indicate the suitable order parameters 
for identifying the local minimum-energy states automat¬ 
ically. From the free-energy surfaces obtained by RMA, 
we identified not only the native and misfolded states but 
also the intermediate state. RMA neglected the confor¬ 
mational changes with fast relaxation and extracted the 
intermediate state showing a turn structure from Asp3 
to Glu5. Thus, using RMA clarified the folding process 
of chignolin to the native or misfolded structures through 
the turn structure. Analysis of the free energy differences 
further revealed that the transition from the folding to 
native state through the turn structure is easier than that 
to the misfolded state. 

Note that in simulations, the free-energy surface de¬ 
pends on both the temperature and simulation time in 
practice. In general, it is still difficult to determine 
whether or a not a simulation time is sufficient. The 
free-energy surface obtained in the present study resulted 
from a 750-ns simulation near the transition temperature. 
To support the validity of the free-energy surface, we also 
need to investigate the stability of the local minimum- 
energy states. In addition, the free-energy surface near 
a transition temperature is different from that at a room 
temperature. Nevertheless, understanding the dynamics 
and kinetics of chignolin near its transition temperature 
can provide important information on the folding pro¬ 
cess. Overall, these results indicated that RMA can be 
used to effectively analyze long simulations at room tem¬ 
perature and is also useful for investigating systems with 
large conformational changes, such as intrinsically disor¬ 
dered proteins and protein folding. 

Our experience of RMA suggests that a simulation 
time longer than approximately 10 r s i ow is preferable, 
where r s i ow is the slowest relaxation time. In addition, 
even if such a long simulation cannot be performed, rare 
transitions, (e.g., one transition), can be obtained during 
the simulation to identify local minimum-energy states 


and the transitions between them. 

We have here proposed a new analysis method called 
MSRMA. By applying RMA to the Markov state model, 
we introduced the evolution time to, which reduced the 
relative weight of the faster modes contained in the phys¬ 
ical quantities. From clustering the states using the 
free-energy surface obtained by RMA, we constructed a 
Markov state model and performed MSRMA. Analysis 
and comparison of the relaxation times of the states ob¬ 
tained by a simple Markov state model and MSRMA 
showed similar relaxation times to those obtained from 
RMA. Moreover, we show that MSRMA clearly improves 
the approximate relaxation times. 
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Appendix A: Relaxation modes 

We consider a Langevin equation for a biomolecule 
with N atoms: 

dll ■ f) 

m~ = -C, Vi - g^U{{rj}) + Wi (Al) 

with 


Here, r,;(f) and u,(t) denote the position and the veloc¬ 
ity of the itli atom at time t , respectively, mi is the 
mass of the ith atom and £ is the friction constant. The 
interaction between atoms is described by the potential 
U({ r i}) = U(ri,..., rjv). The random force Wi(t) acting 
on the zth atom is a Gaussian white stochastic process 
and satisfies 

{wi,a(t)wj,p(t)) = 2C k B T6 a> pS it j6(t - t'), (A3) 

where Wi^ a , fee, and T denote the a-component of Wi 
(a=x, y, or z), the Boltzmann constant, and the temper¬ 
ature of the system, respectively. The Kramers equation 
equivalent to Eqs. JAD and (IA2I) can be written as 

^P{Q,t) = -r(Q)P(Q,t) . (A4) 

Here, Q = {jt, . .., rjv, Wi, ■ ■ •, vjv} denotes a point in 
the phase space of the system, and P(Q, t)dQ denotes the 







probability that the system is found at time t in an in¬ 
finitesimal volume dQ at the point Q in the phase space. 
The time-evolution operator r is explicitly given by 


If 


r(Q) = 

vl— ( keTjr 

^{dr, 1 m, dvi dr t m, dv t \ 1 im dv t 

l—l v x 

(A5) 

Because Eqs. ED and (IA2I) are nonlinear for anharmonic 
potentials, it is generally difficult to define relaxation 
modes as normal coordinates of the equations. However, 
as explained below, we can define relaxation modes on 
the basis of Eq. (IA4I) , which is a linear equation. 

Let 4> n {Q) and i/j n {Q) denote the left and right eigen¬ 
vectors of the time-evolution operator r(Q) with an 
eigenvalue X n , respectively: 


P{Q)4>n(Q) = A n ^ n (Q) 

(A6) 

r\Q)MQ) = A nMQ)- 

(A7) 


Here, r\Q) is the adjoint operator of r{Q) defined by 

J f(Q)(r(Q)g(Q))dQ = J (. r\Q)f(Q))g(Q)dQ , 

(A8) 

which satisfies 


r(Q)5(Q - Q') = (Q')S(Q - Q’) . (A9) 

The eigenvectors are chosen to satisfy the orthonormal 
condition: 


dQ(f>n{Q')' l /jm(Q') — 3n,m ■ 


(A10) 


The equilibrium distribution function for Kramers equa¬ 
tion 


P eq {Q) oc exp 


1 

k^T 


+U ({ r j}) 


(AH) 

is a right eigenfunction of r(Q) with zero eigenvalue: 
r(Q)P eq (Q) = 0. The corresponding left eigenfunction 


is f. 

From Eq. ED, a formal solution of P(Q,t) for an 
initial condition P(Q, 0) = Pq(Q) is given by 


P(Q,t) =e~ nQ)t P 0 {Q) . (A12) 


The conditional probability T t (Q\Q')dQ that the system 
is found at time t in an infinitesimal volume dQ at Q , 
given that the system is at Q' at time 0, is then given by 

T t (Q\Q') = e~ r ( Q)t 6 (Q - Q') . (A13) 


The time-displaced correlation functions of two physical 
quantities A(Q ) and B(Q) in the equilibrium are given 


by 

{A(t)B{ 0)) = J j A{Q)T t {Q\Q')B{Q')P eq {Q')dQdQ' 
= J A(Q)e~ r ^ t B(Q)P eq (Q)dQ 
= J je -rt(Q)t A(Q)j B(Q)P eq (Q)dQ 

(A14) 

We define a quantity 4> n {Q) through 

MQ) =k(Q)Pe q (Q) ■ (A15) 

From Eq. (1AI4I) . the equilibrium time-displaced correla¬ 
tion function of 4> n (Q) and (f> m {Q ) is given by 

(<Mi)</>m(0)} = Sn, m e~ Xnt • (A16) 

If two quantities A{Q) and B(Q) are expanded as 
A(Q) =^2a n (j) n (Q) and B(Q) = Y b n ^> n (Q), (A17) 

n n 

with 

a n = j dQA(Q)(j> n (Q)P eq (Q) = {Acj> n } (A18) 

and 

b n = j dQct> n (Q)B(Q)P eq (Q) = (cf> n B), (A19) 

then the time correlation function of A and B in the 
equilibrium is given by 

{A(t)B( 0)) = Y, «nS„e- A "‘ . (A20) 

n 

Thus, in terms of (f> n (Q ) and (f> n (Q ), the correlation func¬ 
tion (A(t)B( 0)) is decomposed into a sum of exponen¬ 
tially relaxing contributions. Therefore, we use two sets 
of functions, {(frniQ)} and {4>n(Q)}, as relaxation modes 
and call {A„} their relaxation rates. Note that the eigen¬ 
values A ra are not necessarily real in general. Therefore, 
the terms e _Ant in Eq. (IA20I) can describe oscillatory 
behavior—. 

The Kramers equation satisfies the detailed balance 
condition^ 

P eq (Q')r(Q)6(Q - Q') = P eq (eQ)r\eQ)S(eQ - eQ') , 

(A21) 

and 


Peq(Q) = P eq (eQ) , 


(A22) 

where eQ denotes the time-reversed state of the state Q , 
namely eQ = {ein,..., e N r N , .ejv+iWi, ■ ■ ■, e 2 NV N } with 


1 for i = 1,..., N, 

-1 for i = N+1,...,2N. 

This leads to the relation 

(j) n {Q) oc 4 > n (eQ) 


(A23) 


(A24) 


between the relaxation modes. 
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Appendix B: Relaxation mode analysis 

The eigenvalue problem of eqs. CMt and (IA71) can be 
written as 

e~ nQ)T MQ)Pe q (Q) = e~ x " T MQ)P eq (Q) (Bl) 

and 

e~ r ' {Q)T MQ) = e~ XnT MQ) ■ (B 2 ) 

This eigenvalue problem is equivalent to the variational 
problem 

5TZ = 0 (B3) 

for 


For the trial functions Eqs. (IB5I) and (IB6I) . the varia¬ 
tional equation is equivalent to a generalized eigenvalue 
problem 


fJC(t 0 + T) = e- x ^fJC(t 0 ) 

(BH) 

C{t 0 + r)f p = e~ XpT C(t 0 )f p , 

(B12) 


where X p is the relaxation rate corresponding to X p and 
X p . The matrix C(t) is an equilibrium time-displaced 
correlation matrix defined by 

C{t) = (R(t)R T ( 0)) . (B13) 

The orthonormal condition for X p and X p is written as 


5 0n] 


(<l>n(r)$n( 0)) 

(bn 4^n ) 


(B4) 


and the stationary value of 1Z gives the eigenvalue 
exp(—A„r). In Ref. ITsl a linear combination of rele¬ 
vant quantities is used as a trial function for (j) n and (j> n , 
and the variational problem is solved. The solution is 
given as a solution of a generalized eigenvalue problem 
for the equilibrium time-displaced correlation matrices of 
the quantities. 

We use the following functions as an approximate re¬ 
laxation mode: 


f p C(t 0 )f q =S p , q . (B14) 

The time correlation functions of relaxation modes are 
now written as 

(X p (t)X q ( 0)) ~ 6 Pt q exp(—Apt) . (B15) 

Because Xp and X q are approximate relaxation modes 
determined by solving for one value of r, the relation 
holds exactly for t = 0 and t = r, and is expected to 
hold approximately for other values of t > 0. 

By using Eq. (1A24I) , we choose 


X P = f, 


_ f T -r+to/2 


R 


and 


(B5) 


X P {Q) =X p {eQ). (B16) 

Then, the detailed balance condition (IA21I) leads to 


= Peq7 p T e- rto/2 /«Pe q (B6) 

as the pth trial functions for 4> n (Q) and 4> n (Q ), respec¬ 
tively, where 


f P = Ef v (B17) 

with 


E = diag(ei, El, £i, £2, £2, £2, • • • , £2N, £2 N, £2jv)- (B 18 ) 


fp — (/p.li ■ ■ ■ j /p,6iv) 


(B7) Because the detailed balance condition also leads to 


and 

fp = (fp,ljp,2, ■ • ■, f P ,6N) T (B8) 

are variational parameters. Here, R is defined by 

R=(r' 1 T ,r' 2 T ,...,r' N T ,v' 1 T ,...,v' N T ) T (B9) 

with 

r'i = r t - (n) and v\ = t>j - (u,) , (BIO) 

where r,; and Vi are the coordinate and velocity vectors of 
the ith atom of the biopolymer, and (r,) and (vt) are the 
average coordinate and velocity vectors. The operators 
e -rho/2 anc j e -rto /2 are usec [ i 0 reduce the contributions 
of faster modes in R and RP eq , respectively, and the trial 
functions become better approximations as t q becomes 
larger. 


C(t) T = EC(t)E, (B19) 

it can be seen that the generalized eigenvalue problems 
(IB 111) and (IB12I) are equivalent. 

From the orthonormal condition, the inverse transfor¬ 
mation of Eqs. (IB5I) and (IB6I) is derived as 

6 AT—12 

e -rHo/ 2 R= g p x p (B20) 

P= 1 

and 

QN —12 

P~ 1 e- rt °/ 2 RP eq = J2 9 P X P (B21) 

P= 1 

with 

Q P = C(t 0 )f p and g p = C(t 0 ) T f p . (B22) 
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In Eqs. (IB20I) and (IB21I) . it is assumed that the trans¬ 
lational and rotational motions are removed. It follows 
from Eqs. (IB 171) and (IB 191) that 

g p = Eg p . (B23) 

By using Eqs. (IB20I) and (IB21I) . the equilibrium time- 
displaced correlation functions of R are given as 

6 AT—12 6 AT—12 

C(t) = (R(t)R T (0)) = Y E 9pg^(X P (t-to)X q m 

p= 1 9=1 

6N -12 

- E 9pS e_Apt ( B24 ) 

p= 1 

for t > to. Here, 

g p = e x r to/2 g p and ~g p = e^ to/2 g p . (B25) 

Note that 

9 P = e 9p■ (B26) 

Equation (IB24I) can be regarded as Eq. (IA2QI) with the 

mode expansion 

6 AT—12 6iV—12 

Y 9p X p and R - E 9pX p . (B27) 

p—1 p= 1 

The RMA method explained above can be summarized 
as follows. The approximate relaxation modes X p and 
X q , which are specified by f p and f p = Ef p through Eqs. 
(|B5|) and (IB6|) , and the corresponding relaxation rate X p 
are determined by solving the eigenvalue problem ([Blip 
or (|B12|) . which are equivalent, with the condition (|B14|) . 
where the equilibrium time-displaced correlation matrix 
C(t) is defined by (|B13|) . In terms of the approximate 
relaxation modes and rates, C(t) with t > to can be 
calculated from Eq. (|B24|) . where g p and g p = Eg p are 
given by Eq. (|B25|) . By comparing C(t) calculated in the 
manner described above with that directly calculated by 
simulations, the accuracy of the approximate relaxation 
modes and rates can be examined. 

In this paper, we deal with position coordinates only, 
for which a = 1. Therefore, the following relations hold: 
Xp = X p , f p = f p , g p = g p , g p = g p , and C (£) = C(t) . 
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FIGURE CAPTIONS: 
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time ( ns) 


FIG. 1. The time series of RMSD (A) from the native struc¬ 
ture of chignolin near a transition temperature. There are 
many transitions among the native-like structures (RMSD ~ 
1 A), misfolded structures (RMSD ~ 2 A), and unfolded struc¬ 
tures (RMSD ~ 5 A). 
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(C) 
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FIG. 2. The free-energy surfaces of (a) and $ 2 , (b) $2 
and $ 3 , and (c) $1 and <£ 3 . 
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FIG. 3. The relations between RMSD and (a) 'hi, (b) <f>2, or 
(c) $3. 



t(ps) 


FIG. 4. The time-displaced autocorrelation functions 
($ p (t)$ p (0)) of % {p = 1, • • • , 9) for PCA. 
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FIG. 5 . The free-energy surfaces of (a) Y\ and Y2, and (b) Y2 
and >3 for RMA 



FIG. 6. The relations between RMSD and (a) Yi, (b) Y2, or 
(c) 13 (c). 
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FIG. 7. The time-displaced autocorrelation functions 
(Y p (t)Y p { 0)> of Y p , (p= 1, • • • ,9) for RMA. 



0 2 4 6 8 10 12 14 

D(Asp3N-Gly70) (A) 

FIG. 8. The free-energy surface along the distances between 
the amide nitrogen atom of Asp3 and the carbonyl oxygen 
atom of Thr8, D(Asp3N-Thr80), and between the amide ni¬ 
trogen atom of Asp3 and the carbonyl oxygen atom of Gly7, 
D(Asp3N-Gly70). 
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FIG. 9. Structures of the (a) native state, (b) misfolded state, 
(c) intermediate state, and (d) unfolded state (Multimedia 
view). The structures are fitted on the backbone atoms from 
Asp3 to Glu5. The movies of the snapshots are made using 
Visual Molecular Dynmics (VMD)^. The residues of Glyl 
and GlylO are shown in red and cyan, respectively. Some 
characteristic hydrogen bonds obtained by the Hbonds plugin 
of VMD with the cut-off distance of 3.0 A and the cut-off 
angle of 30 degrees are also shown. Some important oxygen 
and nitrogen atoms of the backbone are shown as red and 
blue spheres, respectively. 
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FIG. 10. The distributions for the native state (red), mis- 
folded state (green), and intermediate state (blue) in the 
planes of (a) Ti and $2, (b) <1?2 and 'Fa, and (c) <Fi and 
"Fa- 
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FIG. 11. (a) The free-energy surface of Y\ and Y 2 show¬ 

ing three regions for the native, misfolded, and intermediate 
structures listed in Table II (in orange) and the dividing lines 
for the four states of the Markov state models and the Markov 
state relaxation mode analysis model (in blue), (b) Schematic 
drawing of the free-energy differences between the intermedi¬ 
ate stae and the other three states (native, misfolded, and 
unfolded states). The contour lines are drawn in the range 
from 3 to 8 at an interval of 0.25. The values in (b) are 
RT x A F, where A F is the difference of the free energy given 
by Eq. (1191 and RT is 0.89 kcal/mol at T = 450 K. The free 
energy differences are approximately estimated by counting 
the numbers of contour lines. 
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( a ) Tau (ps) 




(c) 


1000 2000 3000 4000 5000 
Tau (ps) 


FIG. 12. The relaxation times of the (a) second, (b) third, 
and (c) fourth relaxation modes obtained by MSRMA as a 
function of time interval t. The lines of to = 0 ps correspond 
to the results of a simple Markov state model. 
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(a) t 0 = 0 ps and r = 1000 ps 





(c) t 0 = 10 ps and r = 1000 ps (d) t 0 = 10 ps and r = 3000 ps 



FIG. 13. The normalized time-displaced autocorrelation func¬ 
tions of Eq. (1321) calculated directly (Sim) and repro¬ 

duced by Eq. m ■ The values of to = 0 and r in picoseconds 
are (a) 0 and 1000, (b) 0 and 3000, (c) 10 and 1000, (d) 10 
and 3000, (e) 200 and 100, and (f) 500 and 100, respectively. 
The results of to = 0 ps correspond to those of the simple 
Markov state model. 
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(g) 






Fig. SI (Supplemental Material): The time-displaced 
autocorrelation functions of x,y,z-coordinates for ith C a 
atom (i = 1, • • • , 10) obtained by the simulation directly 
(Sim) and reproduced by RMA (RMA). 
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Fig. S2-1 (Supplemental Material): The 

ramachandran plots of each residue from Tyr2 to Trp9 (from 
(a) to (h)) of the nateve state. 
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Fig. S2-2 (Supplemental Material): The 

ramachandran plots of each residue from Tyr2 to Trp9 (from 
(a) to (h)) of the misfolded state. 
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Fig. S2-3 (Supplemental Material): The 

ramachandran plots of each residue from Tyr2 to Trp9 (from 
(a) to (h)) of the intermediate state. 
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Fig. S2-4 (Supplemental Material): The 

ramachandran plots of each residue from Tyr2 to Trp9 (from 
(a) to (h)) of the unfolded state. 
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Fig. S3 (Supplemental Material): The time series of 
the values of RMSD from the native structure for the native, 
misfolded, intermediate, and unfolded states listed in Table 
II. The time series of all steps (a), from 200 ns to 400 ns (b), 
from 400 ns to 600 ns (c), and from 600 ns to 800 ns (d) are 
shown. 




















